181 research outputs found

Practical lower and upper bounds for the Shortest Linear Superstring

Author: Cazaux B.
Juhel Samuel
Rivals Eric
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2018

Peer reviewed

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

Convergence of the Number of Period Sets in Strings

Author: Rivals Eric
Sweering Michelle
Wang Pengfei
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)
Publication date: 01/01/2023

Consider words of length n. The set of all periods of a word of length n is a subset of {0, 1, 2, ..., n−1}. However, not every subset of {0, 1, 2, ..., n−1} is a valid set of periods. In a seminal paper in 1981, Guibas and Odlyzko proposed to encode the set of periods of a word into a binary string of length n, called an autocorrelation, where a one at position i denotes the period i. They considered the question of recognizing a valid period set, and also studied the number of valid period sets for strings of length n, denoted κ_n. They conjectured that ln(κ_n) asymptotically converges to a constant times ln²(n). Although improved lower bounds for ln(κ_n)/ln²(n) were proposed in 2001, the question of a tight upper bound has remained open since Guibas and Odlyzko's paper. Here, we exhibit an upper bound for this fraction, which implies its convergence and closes this longstanding conjecture. Moreover, we extend our result to find similar bounds for the number of correlations: a generalization of autocorrelations which encodes the overlaps between two strings.
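
To make the definition of an autocorrelation concrete, here is a minimal Python sketch. It is an illustration of the definition only, not the method of the paper: the function names `autocorrelation` and `count_period_sets` are hypothetical, and the count is obtained by brute-force enumeration over a two-letter alphabet (an assumption made here for brevity), which is only feasible for small n.

```python
from itertools import product


def autocorrelation(word: str) -> str:
    """Return the autocorrelation of `word` as a binary string of length n.

    Position p holds '1' when p is a period of the word, that is, when
    word[i] == word[i + p] for every i with 0 <= i < n - p.
    Position 0 is always '1' because 0 is trivially a period.
    """
    n = len(word)
    bits = []
    for p in range(n):
        is_period = all(word[i] == word[i + p] for i in range(n - p))
        bits.append("1" if is_period else "0")
    return "".join(bits)


def count_period_sets(n: int, alphabet: str = "ab") -> int:
    """Count distinct autocorrelations over all words of length n.

    Brute force: enumerates len(alphabet)**n words, so keep n small.
    """
    seen = {autocorrelation("".join(w)) for w in product(alphabet, repeat=n)}
    return len(seen)


if __name__ == "__main__":
    print(autocorrelation("abaab"))  # periods 0 and 3
    print(count_period_sets(4))
```

For the word `abaab`, only the shifts 0 and 3 align the word with itself, so the sketch reports the autocorrelation `10010`.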

CWI's Institutional Repository

Dagstuhl Research Online Publication Server

DNA Slippage Occurs at Microsatellite Loci without Minimal Threshold Length in Humans: A Comparative Genomic Approach

Author: Jarne Philippe
Leclercq Sébastien
Rivals Eric
Publication venue: Oxford University Press
Publication date: 01/01/2010

The dynamics of microsatellites, or short tandem repeats (STRs), are well documented for long, polymorphic loci, but much less is known for shorter ones. For example, the issue of a minimum threshold length for DNA slippage remains contentious. Model-fitting methods have generally concluded that slippage only occurs over a threshold length of about eight nucleotides, in contradiction with some direct observations of tandem duplications at shorter repeated sites. Using a comparative analysis of the human and chimpanzee genomes, we examined the mutation patterns at microsatellite loci with lengths as short as one period plus one nucleotide. We found that the rates of tandem insertions and deletions at microsatellite loci strongly deviated from background rates in other parts of the human genome and followed an exponential increase with STR size. More importantly, we detected no lower threshold length for slippage. The rate of tandem duplications at unrepeated sites was higher than expected from random insertions, providing evidence for genome-wide action of indel slippage (an alternative mechanism generating tandem repeats). The rate of point mutations adjacent to STRs did not differ from that estimated elsewhere in the genome, except around dinucleotide loci. Our results suggest that the emergence of STRs depends on DNA slippage, indel slippage, and point mutations. We also found that the dynamics of tandem insertions and deletions differed in both the rates and the sizes at which these mutations take place. We discuss these results in both evolutionary and mechanistic terms.

CRAC: an integrated approach to the analysis of RNA-seq reads

Author: Commes Thérèse
Philippe Nicolas
Rivals Eric
Salson Mikaël
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/03/2013

A large number of RNA-sequencing studies set out to predict mutations, splice junctions or fusion RNAs. We propose a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from RNA-seq read analysis. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in each single read. CRAC increases precision compared with existing tools, reaching 99.5% for splice junctions, without losing sensitivity. Importantly, CRAC predictions improve with read length. In cancer libraries, CRAC recovered 74% of validated fusion RNAs and predicted novel recurrent chimeric junctions. CRAC is available at http://crac.gforge.inria.fr

HAL - Lille 3

Springer - Publisher Connector

INRIA a CCSD electronic archive server

HAL-Inserm

PubMed Central

HAL Descartes

Computing Phylo-k-mers

Author: Linard Benjamin
Pardi Fabio
Rivals Eric
Romashchenko Nikolai
Publication venue
Publication date: 19/09/2022

Phylogenetically informed k-mers, or phylo-k-mers for short, are k-mers that are predicted to appear within a given genomic region at predefined locations of a fixed phylogeny. Given a reference alignment for this genomic region and assuming a phylogenetic model of sequence evolution, we can compute a probability score for any given k-mer at any given tree node. The k-mers with sufficiently high probabilities can later be used to perform alignment-free phylogenetic classification of new sequences, a procedure recently proposed for the phylogenetic placement of metabarcoding reads and the detection of novel virus recombinants. While computing phylo-k-mers, we need to consider large numbers of k-mers at each tree node, which warrants the development of efficient enumeration algorithms. We consider a formal definition of the problem of phylo-k-mer computation: how can we efficiently find all k-mers whose probability lies above a user-defined threshold for a given tree node? We describe and analyze algorithms for this problem, relying on branch-and-bound and divide-and-conquer techniques. We exploit the redundancy of adjacent windows of the alignment and the structure of the probability matrix to save on computation. Besides computational complexity analyses, we provide an empirical evaluation of the relative performance of their implementations on real-world and simulated data. The divide-and-conquer algorithms, which to the best of our knowledge are novel, are found to be clear improvements over the branch-and-bound approach, especially when a large number of phylo-k-mers are found.
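
The enumeration problem stated in this abstract can be illustrated with a generic branch-and-bound sketch in Python. This is not the authors' implementation. It assumes, for illustration only, that the score of a k-mer is the product of per-position probabilities taken from a k-column matrix (one dictionary of base probabilities per column); the function name `enumerate_kmers` and the matrix layout are hypothetical.

```python
from typing import Dict, List, Tuple


def enumerate_kmers(
    matrix: List[Dict[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """List all k-mers whose score is strictly above `threshold`.

    Assumption (not from the source): the score of a k-mer is the product
    of matrix[j][base_j] over its k positions.
    """
    k = len(matrix)

    # best_suffix[j] = highest score achievable on columns j..k-1.
    # It serves as the optimistic bound for pruning.
    best_suffix = [1.0] * (k + 1)
    for j in range(k - 1, -1, -1):
        best_suffix[j] = best_suffix[j + 1] * max(matrix[j].values())

    results: List[Tuple[str, float]] = []

    def extend(prefix: str, score: float, j: int) -> None:
        if j == k:
            results.append((prefix, score))
            return
        for base, p in matrix[j].items():
            new_score = score * p
            # Prune: even the best possible completion stays at or
            # below the threshold, so this branch cannot succeed.
            if new_score * best_suffix[j + 1] > threshold:
                extend(prefix + base, new_score, j + 1)

    if k > 0:
        extend("", 1.0, 0)
    return results


if __name__ == "__main__":
    toy_matrix = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.2, "C": 0.6, "G": 0.1, "T": 0.1},
    ]
    for kmer, score in enumerate_kmers(toy_matrix, 0.1):
        print(kmer, round(score, 3))
```

On the toy matrix, every prefix starting with C, G or T is discarded after the first column, because its best completion scores 0.1 × 0.6 = 0.06, below the threshold of 0.1.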

arXiv.org e-Print Archive

Accurate self-correction of errors in long reads using de Bruijn graphs

Author: Rivals Eric
Salmela Leena
Ukkonen Esko
Walve Riku
Publication venue
Publication date: 01/01/2016

Peer reviewed

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

PubMed Central

Helsingin yliopiston digitaalinen arkisto

LoRDEC : accurate and efficient long read error correction

Author: Rivals Eric
Salmela Leena
Publication venue
Publication date: 01/01/2014

Peer reviewed

INRIA a CCSD electronic archive server

PubMed Central

Helsingin yliopiston digitaalinen arkisto

A Fast and Specific Alignment Method for Minisatellite Maps

Author: Buard Jérôme
Bérard Sèverine
Gascuel Olivier
Nicolas François
Rivals Eric
Publication venue: Libertas Academica
Publication date: 01/01/2006

Background: Variable minisatellites count among the most polymorphic markers of eukaryotic and prokaryotic genomes. This variability can affect gene coding regions, as in the prion protein gene, or gene regulation regions, as for the cystatin B gene, and be associated with or implicated in diseases: Creutzfeldt-Jakob disease and myoclonus epilepsy type 1, for our examples. When it affects neutrally evolving regions, the polymorphism in length (i.e. in number of copies) of minisatellites has proved useful in population genetics. Motivation: In these tandem repeat sequences, different mutational mechanisms let the number of copies, as well as the copies themselves, vary. In particular, the interspersion of tandem duplication/contraction events and point mutations makes the succession of variant repeats much more informative than the allele length alone. Exploiting this information requires the ability to align minisatellite alleles by accounting for both point mutations and tandem duplications. Results: We propose a minisatellite maps alignment program that improves on previous solutions. Our new program is faster, simpler, considers an extended evolutionary model, and is available to the community. We test it on the data set of 609 alleles of the MSY1 (DYF155S1) human minisatellite and confirm its ability to recover known evolutionary signals. Our experiments highlight that the informativeness of minisatellites resides in their length and composition polymorphisms. Exploiting both simultaneously is critical to unravel the implications of variable minisatellites in the control of gene expression and diseases. Availability: Software is available at http://atgc.lirmm.fr/ms_align/ Keywords: VNTR, tandem repeat, tandem duplication, variable costs, dynamic programming, sequence comparison

CiteSeerX

Directory of Open Access Journals

PubMed Central

ProdInra